Abstract. Many Control Systems are indeed Software Based Control Systems,
i.e. control systems whose controller consists of control software running on a
microcontroller device. This motivates investigation on Formal Model Based Design
approaches for automatic synthesis of control software.
Available algorithms and tools (e.g., QKS) may require weeks or even months of 
computation to synthesize control software for large-size systems. This motivates 
search for parallel algorithms for control software synthesis. 
In this paper, we present a map-reduce style parallel algorithm for control soft- 
ware synthesis when the controlled system (plant) is modeled as discrete time lin- 
ear hybrid system. Furthermore we present an MPI-based implementation PQKS 
of our algorithm. To the best of our knowledge, this is the first parallel approach 
for control software synthesis. 

We experimentally show effectiveness of PQKS on two classical control synthe- 
sis problems: the inverted pendulum and the multi-input buck DC/DC converter. 
Experiments show that PQKS efficiency is above 65%. As an example, PQKS 
requires about 16 hours to complete the synthesis of control software for the 
pendulum on a cluster with 60 processors, instead of the 25 days needed by the 
sequential algorithm in QKS. 

1 Introduction 

Many Embedded Systems are indeed Software Based Control Systems (SBCSs). An 
SBCS consists of two main subsystems: the controller and the plant. Typically, the 
plant is a physical system consisting, for example, of mechanical or electrical devices 
whereas the controller consists of control software running on a microcontroller. In an 
endless loop, at discrete time instants (sampling), the controller after an Analog-to- 
Digital (AD) conversion (quantization), reads sensor outputs from the plant and after a 
Digital-to-Analog (DA) conversion, sends commands to plant actuators. The controller 
selects commands in order to guarantee that the closed-loop system (that is, the system 
consisting of both plant and controller) meets given safety and liveness specifications 
(System Level Formal Specifications). 

Software generation from models and formal specifications forms the core of Model 
Based Design of embedded software [1]. This approach is particularly interesting for
SBCSs since in such a case system level (formal) specifications are much easier to 
define than the control software behavior itself. 



1.1 Motivations 



In [2] an algorithm is presented, along with a tool QKS, that returns
correct-by-construction control software starting from the following specifications: i) a formal
model of the controlled system, modeled as a Discrete Time Linear Hybrid System
(DTLHS), ii) safety and liveness requirements (goal region) and iii) the number b of bits
for AD conversion.

To this aim, QKS first computes a suitable finite discrete abstraction (control
abstraction [2]) $\hat{\mathcal{H}}$ of the DTLHS plant model $\mathcal{H}$, where $\hat{\mathcal{H}}$ depends on the
quantization schema (i.e. the number of bits b needed for AD conversion) and represents the
plant as it can be seen from the control software after AD conversion. Then, given an
abstraction $\hat{G}$ of the goal states G, a controller $\hat{K}$ is computed that, starting from any
initial abstract state, drives $\hat{\mathcal{H}}$ to $\hat{G}$ regardless of possible nondeterminism. Control
abstraction properties ensure that $\hat{K}$ is indeed a (quantized representation of a) controller
for the original plant $\mathcal{H}$. Finally, the finite automaton $\hat{K}$ is translated into control
software (C code). The whole process is summarized in Fig. 1.

While effective on moderate-size systems, QKS computation time is exponential 
in b, thus resulting in a bottleneck when synthesizing controllers for larger systems. 
This motivates the search for parallel versions of the QKS synthesis algorithm. In fact, from a
computational point of view, the most critical step of QKS is the generation of the control
abstraction $\hat{\mathcal{H}}$ (which is responsible for more than 95% of the overall computation, see [2]).
This stems from the fact that $\hat{\mathcal{H}}$ is computed explicitly, by solving a Mixed Integer Linear
Programming (MILP) problem for each triple $(\hat{x}, \hat{u}, \hat{x}')$ (where $\hat{x}, \hat{x}'$ are abstract
states of $\hat{\mathcal{H}}$ and $\hat{u}$ is an abstract action of $\hat{\mathcal{H}}$). Since the number of abstract states is
$2^b$, b being the number of bits needed for AD conversion of all variables in the plant,
the QKS computation time is exponential in $2b + b_u$ ($b_u$ being the number of
bits needed to encode actions). In QKS suitable optimizations reduce the complexity to
be exponential in $b + b_u$. However, for large-sized systems this may lead to unacceptable
computation time, even considering that the finite state abstraction generation is
an off-line computation. In large-sized embedded systems this computation may take
up to weeks or even months since b may be very large for two typical reasons. First,
since each plant continuous state variable needs to be quantized, the number of bits
is necessarily high when the plant model consists of many continuous variables, such
as, for example, in an air traffic control system [3]. Second, controllers synthesized by
considering a finer quantization schema usually have a better behaviour with respect 
to non-functional requirements, such as ripple and set-up time. Therefore, when a high 
precision is required, a small sampling time and a large number of quantization bits 
must be considered. 

As an example, experimental results show that QKS takes nearly one month (25
days) of CPU time to synthesize the controller for a 13-bit quantized inverted pendulum
(which is described by only two continuous state variables, see Sect. 5.1). Moreover,
99% of those 25 days of computation is devoted to control abstraction generation. This 
may result in a loss in terms of time-to-market in control software design when QKS is 
used. 

Fig. 1. Control Software Synthesis Flow.



1.2 Main Contributions 



To overcome this obstacle, in this paper we show that the QKS control abstraction
procedure may be organized as an embarrassingly parallel task. Map-Reduce [4] is a
(LISP inspired) programming paradigm advocating a form of embarrassing parallelism
for effective massive parallel processing. An implementation of such an approach is
in Hadoop (e.g., see [5]). The effectiveness of the Map-Reduce approach stems from
the minimal communication overhead of embarrassing parallelism. This motivates our
goal of looking for a map-reduce style parallel algorithm for control software synthesis
from system level formal specifications. Hence, we design a parallel version of QKS that is
inspired by the map-reduce programming style and that we call Parallel QKS (PQKS in the
following). PQKS is actually implemented using MPI (Message Passing Interface) and
is designed to exploit the computational power available in modern
computer clusters (distributed memory model). Such an algorithm will be presented
in Sect. 4, after a discussion of the basic notions needed to understand our approach
(Sect. 2) and the description of the standalone (i.e. serial) algorithm of QKS (Sect. 3).

We assess the effectiveness of PQKS by running it on two widely used embedded
systems, which are challenging examples for the automatic synthesis of
correct-by-construction control software: the multi-input buck DC-DC converter [6] and the
inverted pendulum [7] benchmarks. Experimental results on these
benchmarks are discussed in Sect. 5, showing that we achieve a nearly linear
speedup w.r.t. QKS, with efficiency above 65%. As an example, PQKS requires about
16 hours to complete the above mentioned synthesis of the 13-bit pendulum on a cluster
with 60 processors, instead of the 25 days of QKS.

1.3 Related Work 

In the literature there are many works presenting algorithms (and tools) for the automatic
synthesis of control software under various assumptions. As an example, the
works in [8, 9, 10, 7] propose algorithms that express the input model in discrete time and
are able to manage infinite-state systems like those arising from hybrid systems, our
focus here. However, none of these approaches has a parallel version of any type, thus
the algorithms they propose suffer from the same bottleneck as QKS.

A parallel algorithm for control software synthesis has been presented in [11], where
however non-hybrid systems are addressed, control is obtained by Monte Carlo simulation
and quantization is not taken into account. Moreover, note that in the literature "parallel
controller synthesis" often refers to synthesizing parallel controllers (e.g., see [12]
and [13] and citations thereof), whereas here we parallelize the (offline) computation
required to synthesize a standalone controller.

As discussed in Sect. 1.1, the present paper builds mainly upon the tool QKS presented
in [2]. Other works about QKS comprise the following ones. In [14] it is shown
that expressing the input system as a linear predicate over a set of continuous as well
as discrete variables is not a limitation on the modeling power. In [15] it is shown how
non-linear systems may be modeled by using suitable linearization techniques. The
paper in [16] addresses model based synthesis of control software by trading system
level non-functional requirements (such as optimal set-up time, ripple) with software
non-functional requirements (its footprint, i.e. size). The procedure which generates the
actual control software (C code) starting from a finite automaton of a control law is
described in [17]. In [18] it is shown how to automatically generate a picture illustrating
control software coverage. Finally, in [19] it is shown that the quantized control
synthesis problem underlying the QKS approach is undecidable. As a consequence, QKS
is based on a correct but non-complete algorithm. Namely, QKS may return one of the
following results: i) Sol, in which case a correct-by-construction control software is
returned; ii) NoSol, in which case no controller exists for the given specifications; iii)
Unk, in which case QKS was not able to compute a controller, but a controller may
exist.

Summing up, to the best of our knowledge, no previous parallel control software 
synthesis from formal specifications has been published. 

2 Background on DTLHS Control Software Synthesis 

To make this paper self-contained, in this section we briefly summarize previous work 
on automatic generation of control software for Discrete Time Linear Hybrid System 
(DTLHS) from System Level Formal Specifications. 

As shown in Fig. 1, we model the controlled system (i.e. the plant) as a DTLHS
(Sect. 2.4), that is a discrete time hybrid system whose dynamics is modeled as a
guarded (linear) predicate (Sect. 2.1) over a set of continuous as well as discrete variables.
The semantics of a DTLHS is given in terms of a Labeled Transition System
(LTS, Sect. 2.2). Given a DTLHS plant model $\mathcal{H}$, a set of goal states G (liveness
specifications) and an initial region I, both represented as linear predicates, we are interested
in finding a restriction K of the behaviour of $\mathcal{H}$ such that in the closed loop system all
paths starting in a state in I lead to G after a finite number of steps. Finding K is the
DTLHS control problem (Sect. 2.5), that is in turn defined as a suitable LTS control
problem (Sect. 2.3). Since we want to output control software, we are interested in
controllers that take their decisions by looking at quantized states, i.e. the values that
the control software reads after an AD conversion. To this aim, the solution of a quantized
control problem (Sect. 2.6) is computed by first generating a discrete abstraction
of $\mathcal{H}$, called control abstraction (Sect. 3, step 1 in Fig. 1), then by applying to such
control abstraction known techniques in order to generate a controller (step 2 in Fig. 1),
and finally synthesizing control software (step 3 in Fig. 1). Our main contribution in
this paper is in the control abstraction generation, thus we will focus this section on the
basic notions needed to understand the definition and computation of control abstractions (Sect. 3).

2.1 Predicates 

We denote with $[n]$ an initial segment $\{1, \ldots, n\}$ of the natural numbers. We denote
with $X = [x_1, \ldots, x_n]$ a finite sequence of variables that we may regard, when convenient,
as a set. Each variable x ranges on a known (bounded or unbounded) interval $\mathcal{D}_x$
either of the reals (continuous variables) or of the integers (discrete variables). We denote
with $\mathcal{D}_X$ the set $\prod_{x \in X} \mathcal{D}_x$. Boolean variables are discrete variables ranging on the
set $\mathbb{B} = \{0, 1\}$. To clarify that a variable x is continuous (resp. discrete, resp. boolean)
we may write $x^r$ (resp. $x^d$, $x^b$). Analogously $X^r$ ($X^d$, $X^b$) denotes the sequence of real
(discrete, boolean) variables in X. Unless otherwise stated, we suppose $\mathcal{D}_{X^r} = \mathbb{R}^{|X^r|}$
and $\mathcal{D}_{X^d} = \mathbb{Z}^{|X^d|}$. If x is a boolean variable, we write $\bar{x}$ for $(1 - x)$.

A linear expression over a list of variables X is a linear combination of variables in
X with rational coefficients. A linear constraint over X (or simply a constraint) is an
expression of the form $L(X) \le b$, where $L(X)$ is a linear expression over X and b is a
rational constant. In the following, we also write $L(X) \ge b$ for $-L(X) \le -b$.

Predicates are inductively defined as follows. A constraint $C(X)$ over a list of
variables X is a predicate over X. If $A(X)$ and $B(X)$ are predicates over X, then
$(A(X) \wedge B(X))$ and $(A(X) \vee B(X))$ are predicates over X. Parentheses may be omitted,
assuming usual associativity and precedence rules of logical operators. A conjunctive
predicate is a conjunction of constraints. For conjunctive predicates we will also
write: $L(X) = b$ for $((L(X) \le b) \wedge (L(X) \ge b))$ and $a \le x \le b$ for $x \ge a \wedge x \le b$,
where $x \in X$.

Given a constraint $C(X)$ and a fresh boolean variable (guard) $y \notin X$, the guarded
constraint $y \to C(X)$ (if y then $C(X)$) denotes the predicate $((y = 0) \vee C(X))$.
Similarly, we use $\bar{y} \to C(X)$ (if not y then $C(X)$) to denote the predicate
$((y = 1) \vee C(X))$. A guarded predicate is a conjunction of either constraints or guarded
constraints.
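
As an illustration of the definitions above, the following is a minimal Python sketch of how constraints, guarded constraints and their conjunctions could be evaluated on a given assignment of values to variables. All function names and the sample predicate are hypothetical and are not taken from QKS.

```python
from fractions import Fraction


def linear(coeffs, valuation):
    """Value of the linear expression sum(c * x) under a valuation (dict: name -> number)."""
    return sum(Fraction(c) * valuation[name] for name, c in coeffs.items())


def constraint(coeffs, bound):
    """The constraint L(X) <= b, as a predicate over valuations."""
    return lambda v: linear(coeffs, v) <= bound


def guarded(guard, cons, negated=False):
    """y -> C(X), i.e. (y = 0) or C(X); with negated=True, (not y) -> C(X), i.e. (y = 1) or C(X)."""
    inactive_value = 1 if negated else 0
    return lambda v: v[guard] == inactive_value or cons(v)


def conjunction(*preds):
    """Conjunction of constraints and guarded constraints (a guarded predicate)."""
    return lambda v: all(p(v) for p in preds)


# Hypothetical guarded predicate: (y -> x <= 3) and (not y -> x >= 5),
# where x >= 5 is written as -x <= -5.
pred = conjunction(
    guarded("y", constraint({"x": 1}, 3)),
    guarded("y", constraint({"x": -1}, -5), negated=True),
)
```

With this sample predicate, the guard y selects which of the two constraints on x is active.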

2.2 Labeled Transition Systems 

A Labeled Transition System (LTS) is a tuple $\mathcal{S} = (S, A, T)$ where S is a (possibly
infinite) set of states, A is a (possibly infinite) set of actions, and $T : S \times A \times S \to \mathbb{B}$
is the transition relation of $\mathcal{S}$. We say that T (and $\mathcal{S}$) is deterministic if
$T(s, a, s') \wedge T(s, a, s'')$ implies $s' = s''$, and nondeterministic otherwise. Let $s \in S$ and $a \in A$.
We denote with $\mathrm{Adm}(\mathcal{S}, s)$ the set of actions admissible in s, that is
$\mathrm{Adm}(\mathcal{S}, s) = \{a \in A \mid \exists s' : T(s, a, s')\}$, and with $\mathrm{Img}(\mathcal{S}, s, a)$ the set of next states
from s via a, that is $\mathrm{Img}(\mathcal{S}, s, a) = \{s' \in S \mid T(s, a, s')\}$. We call self-loop a transition of
the form $T(s, a, s)$. A run or path for an LTS $\mathcal{S}$ is a sequence $\pi = s_0, a_0, s_1, a_1, s_2, a_2, \ldots$ of
states $s_t$ and actions $a_t$ such that $\forall t \ge 0\ T(s_t, a_t, s_{t+1})$. The length $|\pi|$ of a finite run $\pi$
is the number of actions in $\pi$. Sometimes $s_t$ (resp. $a_t$) will be denoted by $\pi^{(S)}(t)$ (resp.
$\pi^{(A)}(t)$).
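
For a finite LTS, the sets Adm and Img and the determinism check can be computed directly from an explicit transition relation. The sketch below assumes (as a simplification not made in the paper, where S and A may be infinite) that T is given as a finite set of triples; the sample LTS is hypothetical.

```python
def adm(T, s):
    """Actions admissible in s: {a | there is s' with T(s, a, s')}."""
    return {a for (s0, a, _) in T if s0 == s}


def img(T, s, a):
    """Next states from s via a: {s' | T(s, a, s')}."""
    return {s1 for (s0, a0, s1) in T if s0 == s and a0 == a}


def is_deterministic(T):
    """T is deterministic iff every (state, action) pair has at most one next state."""
    return all(len(img(T, s, a)) <= 1 for (s, a, _) in T)


def self_loops(T):
    """Transitions of the form T(s, a, s)."""
    return {(s, a, s1) for (s, a, s1) in T if s == s1}


# Hypothetical LTS with states {0, 1, 2} and actions {0, 1}.
T = {(0, 0, 0), (1, 0, 0), (1, 1, 1), (1, 1, 2), (2, 0, 1)}
```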

2.3 LTS Control Problem and Solutions 

A controller for an LTS S is used to restrict the dynamics of S so that all states in the 
initial region will reach the goal region. In the following, we formalize such a concept 
by defining solutions to an LTS control problem. In what follows, let $\mathcal{S} = (S, A, T)$ be
an LTS, and let $I, G \subseteq S$ be, respectively, the initial and goal regions of $\mathcal{S}$.

Definition 1. A controller for $\mathcal{S}$ is a function $K : S \times A \to \mathbb{B}$ such that $\forall s \in S, \forall a \in A$,
if $K(s, a)$ then $\exists s'\ T(s, a, s')$. If $K(s, a)$ holds, we say that the action a is enabled by
K in s.

The set of states $\{s \in S \mid \exists a\ K(s, a)\}$ for which at least one action is enabled is
denoted by $\mathrm{dom}(K)$.



$\mathcal{S}^{(K)}$ denotes the closed loop system, that is the LTS $(S, A, T^{(K)})$, where
$T^{(K)}(s, a, s') = T(s, a, s') \wedge K(s, a)$.

We call a path $\pi$ a fullpath if either it is infinite or its last state $\pi^{(S)}(|\pi|)$ has no
successors (i.e. $\mathrm{Adm}(\mathcal{S}, \pi^{(S)}(|\pi|)) = \emptyset$). $\mathrm{Path}(s, a)$ denotes the set of fullpaths starting
in state s with action a, i.e. the set of fullpaths $\pi$ s.t. $\pi^{(S)}(0) = s$ and $\pi^{(A)}(0) = a$.
Given a path $\pi$ in $\mathcal{S}$, we define $j(\mathcal{S}, \pi, G)$ as follows. If there exists $n > 0$ s.t.
$\pi^{(S)}(n) \in G$, then $j(\mathcal{S}, \pi, G) = \min\{n \mid n > 0 \wedge \pi^{(S)}(n) \in G\}$. Otherwise,
$j(\mathcal{S}, \pi, G) = +\infty$. We require $n > 0$ since our systems are nonterminating and each
controllable state (including a goal state) must have a path of positive length to a goal
state. Taking $\sup \emptyset = +\infty$, the worst case distance of a state s from the goal region G
is $J(\mathcal{S}, G, s) = \sup\{j(\mathcal{S}, \pi, G) \mid \pi \in \mathrm{Path}(s, a),\ a \in \mathrm{Adm}(\mathcal{S}, s)\}$.

Fig. 2. The LTS $\mathcal{S}_1$ in Example 1.

Fig. 3. The LTS $\mathcal{S}_2$ in Example 1.

Definition 2. An LTS control problem is a triple $\mathcal{P} = (\mathcal{S}, I, G)$. A strong solution (or
simply a solution) to $\mathcal{P}$ is a controller K for $\mathcal{S}$, such that $I \subseteq \mathrm{dom}(K)$ and for all
$s \in \mathrm{dom}(K)$, $J(\mathcal{S}^{(K)}, G, s)$ is finite.

A solution $K^*$ to $\mathcal{P}$ is optimal if for all solutions K to $\mathcal{P}$, for all $s \in S$, we have
$J(\mathcal{S}^{(K^*)}, G, s) \le J(\mathcal{S}^{(K)}, G, s)$.

Example 1. Let $\mathcal{S}_1 = (S_1, A_1, T_1)$ be the LTS in Fig. 2 and let $\mathcal{S}_2 = (S_2, A_2, T_2)$ be
the LTS in Fig. 3. $S_1$ is the integer interval $[-1, 2]$ and $S_2 = [-2, 5]$. $A_1 = A_2 = \{0, 1\}$
and the transition relations $T_1$ and $T_2$ are defined by all solid arrows in the pictures. Let
$I_1 = S_1$, $I_2 = S_2$ and let $G = \{0\}$. There is no solution to the control problem
$(\mathcal{S}_1, I_1, G)$. Because of the self-loops of the state 1, we have that both $J(\mathcal{S}_1, G, 1, 0) = +\infty$
and $J(\mathcal{S}_1, G, 1, 1) = +\infty$, that is, the worst case distance from state 1 is infinite whether
the first action is 0 or 1. The controller $K_2$ defined by
$K_2(s, a) = ((s = 1 \vee s = 2) \wedge a = 1) \vee (s \ne 1 \wedge s \ne 2 \wedge a = 0)$ is an optimal strong
solution for the control problem $(\mathcal{S}_2, I_2, G)$.
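
The notions of closed loop system, worst case distance and strong solution can be made concrete on a finite LTS. The following Python sketch assumes an explicitly enumerated finite LTS (the LTSs of Figs. 2 and 3 are not reproduced here, so the transition relation below is purely hypothetical) and computes J by iterating to a fixpoint.

```python
import math


def closed_loop(T, K):
    """T^(K)(s, a, s') = T(s, a, s') and K(s, a); K is the set of enabled (state, action) pairs."""
    return {(s, a, s1) for (s, a, s1) in T if (s, a) in K}


def worst_case_distance(T, G, states):
    """J(S, G, s) on a finite LTS: the largest number of steps (> 0) a fullpath from s needs to hit G."""
    succ = {s: {s1 for (s0, _, s1) in T if s0 == s} for s in states}
    J = {s: math.inf for s in states}
    changed = True
    while changed:
        changed = False
        for s in states:
            if J[s] < math.inf or not succ[s]:
                continue  # already settled, or no successors (sup of the empty set is +inf)
            # s settles once every successor is a goal state or an already settled state.
            worst = max(1 if s1 in G else 1 + J[s1] for s1 in succ[s])
            if worst < math.inf:
                J[s] = worst
                changed = True
    return J


def is_strong_solution(T, K, I, G, states):
    """K is a controller for T, I is contained in dom(K), and J is finite on dom(K)."""
    if not all(any((s0, a0) == (s, a) for (s0, a0, _) in T) for (s, a) in K):
        return False  # K enables an action that has no transition
    dom = {s for (s, _) in K}
    J = worst_case_distance(closed_loop(T, K), G, states)
    return I <= dom and all(J[s] < math.inf for s in dom)


# Hypothetical LTS: state 1 has a self-loop on action 1, state 3 is nondeterministic on action 0.
STATES = [0, 1, 2, 3]
T = {(0, 0, 0), (1, 0, 0), (1, 1, 1), (2, 0, 1), (2, 1, 2), (3, 0, 2), (3, 0, 3)}
G = {0}
K_good = {(0, 0), (1, 0), (2, 0)}
```

Enabling the self-loop action in state 1, or the nondeterministic action in state 3, makes the worst case distance infinite, so neither extension of `K_good` is a strong solution.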

2.4 Discrete Time Linear Hybrid Systems 

In this section we introduce the class of discrete time Hybrid Systems that we use as 
plant models, namely Discrete Time Linear Hybrid Systems (DTLHSs for short). 

Definition 3. A Discrete Time Linear Hybrid System is a tuple $\mathcal{H} = (X, U, Y, N)$
where:

- $X = X^r \cup X^d$ is a finite sequence of real ($X^r$) and discrete ($X^d$) present state
variables. We denote with $X'$ the sequence of next state variables obtained by decorating
with ' all variables in X.

- $U = U^r \cup U^d$ is a finite sequence of input variables.

- $Y = Y^r \cup Y^d$ is a finite sequence of auxiliary variables that are typically used to
model modes (e.g., from switching elements such as diodes) or "local" variables.

- $N(X, U, Y, X')$ is a guarded predicate over $X \cup U \cup Y \cup X'$ defining the transition
relation (next state).

The semantics of DTLHSs is given in terms of LTSs. 

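Definition 4. Let $\mathcal{H} = (X, U, Y, N)$ be a DTLHS. The dynamics of $\mathcal{H}$ is defined by the
Labeled Transition System $\mathrm{LTS}(\mathcal{H}) = (\mathcal{D}_X, \mathcal{D}_U, \tilde{N})$ where $\tilde{N} : \mathcal{D}_X \times \mathcal{D}_U \times \mathcal{D}_X \to \mathbb{B}$
is a function s.t. $\tilde{N}(x, u, x') = \exists y \in \mathcal{D}_Y\ N(x, u, y, x')$. A state x for $\mathcal{H}$ is a state x
for $\mathrm{LTS}(\mathcal{H})$ and a run (or path) for $\mathcal{H}$ is a run for $\mathrm{LTS}(\mathcal{H})$ (Sect. 2.2).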

2.5 DTLHS Control Problem 

A DTLHS control problem $(\mathcal{H}, I, G)$ is defined as the LTS control problem $(\mathrm{LTS}(\mathcal{H}), I, G)$.
To accommodate quantization errors, always present in software based controllers,
it is useful to relax the notion of solution by tolerating an arbitrarily small error $\varepsilon$ on the
continuous variables.

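Let $\varepsilon \ge 0$ be a real number, $W \subseteq \mathbb{R}^n \times \mathbb{Z}^m$. The $\varepsilon$-relaxation of W is the ball
of radius $\varepsilon$: $B_\varepsilon(W) = \{(z_1, \ldots, z_n, q_1, \ldots, q_m) \mid (x_1, \ldots, x_n, q_1, \ldots, q_m) \in W$ and
$\forall i \in [n]\ |z_i - x_i| \le \varepsilon\}$.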

Definition 5. Let $(\mathcal{H}, I, G)$ be a DTLHS control problem and $\varepsilon$ be a nonnegative real
number. An $\varepsilon$ solution to $(\mathcal{H}, I, G)$ is a solution to the LTS control problem
$(\mathrm{LTS}(\mathcal{H}), I, B_\varepsilon(G))$.

Example 2. Let T be the positive constant 1/10 (sampling time). We define the DTLHS
$\mathcal{H} = (\{x\}, \{u\}, \emptyset, N)$ where x is a continuous variable, u is boolean, and
$N(x, u, x') = [\bar{u} \to x' = x + (5/4 - x)T] \wedge [u \to x' = x + (x - 3/2)T]$. Let $I(x) = -1 \le x \le 5/2$
and $G(x) = x = 0$. Finally, let $\mathcal{P}$ be the control problem $(\mathcal{H}, I, G)$. A controller
may drive the system near to the goal G, by enabling a suitable action in such a way
that $x' < x$ when $x > 0$ and $x' > x$ when $x < 0$. However the controller $K(x, u)$
defined by $K(x, u) = (-1 \le x < 0 \wedge \bar{u}) \vee (0 \le x < 2 \wedge u) \vee (1 \le x \le 5/2 \wedge \bar{u})$
is not a solution, because it allows infinite paths to be executed. Since
$K(5/4, 0)$ and $N(5/4, 0, 5/4)$ hold, the closed loop system $\mathcal{H}^{(K)}$ may loop forever along
the path $5/4, 0, 5/4, \ldots$. $K'$ defined by
$K'(x, u) = (-1 \le x < 0 \wedge \bar{u}) \vee (0 \le x < 3/2 \wedge u) \vee (3/2 \le x \le 5/2 \wedge \bar{u})$ is a solution to $\mathcal{P}$.
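
The behaviour described in Example 2 can be checked numerically. The sketch below is written under stated assumptions about the example: the dynamics are read as x' = x + (5/4 - x)T for u = 0 and x' = x + (x - 3/2)T for u = 1, with T = 1/10, and K' is read as switching action at x = 0 and x = 3/2 (enabling u = 1 only on 0 <= x < 3/2). Function names are hypothetical; exact rational arithmetic is used so that fixed points are detected without rounding.

```python
from fractions import Fraction as F

T_SAMPLE = F(1, 10)  # sampling time T of Example 2


def step(x, u):
    """Next state under the assumed dynamics: u = 0 pulls x towards 5/4, u = 1 pushes x away from 3/2."""
    if u == 0:
        return x + (F(5, 4) - x) * T_SAMPLE
    return x + (x - F(3, 2)) * T_SAMPLE


def k_prime(x):
    """The single action enabled by K' in state x (under the reading stated above)."""
    if F(-1) <= x < 0:
        return 0
    if 0 <= x < F(3, 2):
        return 1
    if F(3, 2) <= x <= F(5, 2):
        return 0
    raise ValueError("x is outside the initial region [-1, 5/2]")


def run(x, controller, steps):
    """States visited by the closed loop system in the given number of steps."""
    xs = [x]
    for _ in range(steps):
        x = step(x, controller(x))
        xs.append(x)
    return xs
```

Under these assumptions, 5/4 is a fixed point for u = 0 (which is why K may loop forever there), while a run under K' started at 5/2 ends up oscillating in a small interval around the goal x = 0.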

2.6 Quantized Control Problem 

As usual in classical control theory, quantization (e.g., see [20]) is the process of
approximating a continuous interval by a set of integer values. In the following we
formally define the quantized feedback control problem for DTLHSs.

A quantization function $\gamma$ for a real interval $I = [a, b]$ is a non-decreasing function
$\gamma : I \to \mathbb{Z}$ s.t. $\gamma(I)$ is a bounded integer interval. We will denote $\gamma(I)$ as
$\hat{I} = [\gamma(a), \gamma(b)]$. The quantization step of $\gamma$, notation $\|\gamma\|$, is defined as
$\sup\{|w - z| \mid w, z \in I \wedge \gamma(w) = \gamma(z)\}$. For ease of notation, we extend quantizations to
integer intervals, by stipulating that in such a case the quantization function is the identity function.

Definition 6. Let $\mathcal{H} = (X, U, Y, N)$ be a DTLHS, and $W = X \cup U \cup Y$. A quantization
$\mathcal{Q}$ for $\mathcal{H}$ is a pair $(A, \Gamma)$, where:

- A is a predicate over W that explicitly bounds each variable in W (i.e.,
$A = \bigwedge_{w \in W} \alpha_w \le w \le \beta_w$, with $\alpha_w, \beta_w \in \mathcal{D}_w$). For each $w \in W$, we denote with
$A_w = [\alpha_w, \beta_w]$ its admissible region and with $A_W = \prod_{w \in W} A_w$.

- $\Gamma$ is a set of maps $\Gamma = \{\gamma_w \mid w \in W$ and $\gamma_w$ is a quantization function for $A_w\}$.

Let $W = [w_1, \ldots, w_k]$ and $v = [v_1, \ldots, v_k] \in A_W$. We write $\Gamma(v)$ for the tuple
$[\gamma_{w_1}(v_1), \ldots, \gamma_{w_k}(v_k)]$. Finally, the quantization step $\|\Gamma\|$ is defined as
$\sup\{\|\gamma\| \mid \gamma \in \Gamma\}$.

A control problem admits a quantized solution if control decisions can be made 
by just looking at quantized values. This enables a software implementation for a con- 
troller. 

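Definition 7. Let $\mathcal{H} = (X, U, Y, N)$ be a DTLHS, $\mathcal{Q} = (A, \Gamma)$ be a quantization
for $\mathcal{H}$ and $\mathcal{P} = (\mathcal{H}, I, G)$ be a DTLHS control problem. A $\mathcal{Q}$ Quantized Feedback
Control (QFC) solution to $\mathcal{P}$ is a $\|\Gamma\|$ solution $K(x, u)$ to $\mathcal{P}$ such that
$K(x, u) = \hat{K}(\Gamma(x), \Gamma(u))$ where $\hat{K} : \Gamma(A_X) \times \Gamma(A_U) \to \mathbb{B}$.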

Example 3. Let $\mathcal{P}$, K and $K'$ be as in Ex. 2. Let us consider the quantization
$\mathcal{Q}_1 = (A_1, \Gamma_1)$, where $A_1 = I$, $\Gamma_1 = \{\gamma_x\}$ and $\gamma_x(x) = \lfloor x \rfloor$. The set $\Gamma(A_x)$ of quantized
states is the integer interval $[-1, 2]$. No $\mathcal{Q}_1$ QFC solution can exist, because in state
1 either enabling action 1 or action 0 allows infinite loops to be potentially executed
in the closed loop system. The controller $K'$ in Ex. 2 can be obtained as a quantized
controller by decreasing the quantization step, for example, by considering the quantization
$\mathcal{Q}_2 = (A_2, \Gamma_2)$, where $A_2 = A_1$, $\Gamma_2 = \{\tilde{\gamma}_x\}$ and $\tilde{\gamma}_x(x) = \lfloor 2x \rfloor$.
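
The effect of the two quantizations of Example 3 can be illustrated with a few lines of Python. This sketch assumes the admissible region [-1, 5/2] and the quantization functions floor(x) and floor(2x); the helper names are hypothetical. It shows that the coarser quantization maps the points 5/4 and 3/2 (the fixed points of the two actions under the assumed dynamics of Ex. 2) to the same abstract state, while the finer one separates them.

```python
import math
from fractions import Fraction as F


def gamma_1(x):
    """Quantization function of Q1: floor(x)."""
    return math.floor(x)


def gamma_2(x):
    """Quantization function of Q2: floor(2x)."""
    return math.floor(2 * x)


def quantized_states(gamma, a, b):
    """Image of [a, b] under a non-decreasing gamma: the integer interval [gamma(a), gamma(b)]."""
    return list(range(gamma(a), gamma(b) + 1))


A_LOW, A_HIGH = F(-1), F(5, 2)  # assumed admissible region for x
```

Because both 5/4 and 3/2 fall in abstract state 1 under `gamma_1`, a quantized controller cannot choose different actions for them, which matches the observation that no Q1 QFC solution exists.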



3 Control Abstraction Computation 



As explained in Sect. 1.1, the heaviest computation step for QKS is the computation of
the control abstraction. In this section, we recall the definition of control abstraction, as 
well as how it is computed by QKS. 

In the following, let $\mathcal{H} = (X, U, Y, N)$ and $\mathcal{Q} = (A, \Gamma)$ be, respectively, a DTLHS
and a quantization for $\mathcal{H}$. We say that an action $\hat{u} \in \Gamma(A_U)$ is $\mathcal{Q}$-admissible in a state
$\hat{x} \in \Gamma(A_X)$ iff, for all plant states $x \in \Gamma^{-1}(\hat{x})$ and plant actions $u \in \Gamma^{-1}(\hat{u})$, and
for all plant states $x'$ s.t. $(x, u, x')$ is a transition in $\mathrm{LTS}(\mathcal{H})$, we have that $x' \in A_X$
(that is, $\hat{u}$ always maintains the plant inside its admissible region when starting from $\hat{x}$).
Given this, a $\mathcal{Q}$ control abstraction is an LTS $\hat{\mathcal{H}} = (\Gamma(A_X), \Gamma(A_U), \hat{N})$, where for $\hat{N}$
the following holds: i) each abstract transition in $\hat{N}$ stems from a concrete transition in
$\tilde{N}$; ii) each concrete transition in $\tilde{N}$ is faithfully represented by an abstract transition in
$\hat{N}$, whenever it is not a self loop and its corresponding abstract action is $\mathcal{Q}$-admissible;
iii) if there is no upper bound to the length of concrete paths in $\mathrm{LTS}(\mathcal{H})$ which are all
inside the counter-image of an abstract state then there is an abstract self loop in $\hat{N}$.

We are now ready to give details about control abstraction generation. Namely,
function minCtrAbs in Alg. 1, given a quantization $\mathcal{Q} = (A, \Gamma)$ for a DTLHS
$\mathcal{H} = (X, U, Y, N)$, computes a $\mathcal{Q}$ control abstraction $(\Gamma(A_X), \Gamma(A_U), \hat{N})$ of $\mathcal{H}$. To this aim,
for each abstract state $\hat{x}$ (line 2) an auxiliary function minCtrAbsAux (Alg. 2) is called.
Function minCtrAbsAux decides which transitions, among all the possible ones
starting from $\hat{x}$, fulfill the definition of control abstraction given above, and thus may
be inserted in $\hat{N}$. The checks in lines 2, 3 and 6 and the computation in line 4 of Alg. 2 are
performed by properly defining MILP problems, which are solved using known algorithms
(available in the GLPK package).



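Algorithm 1 Building control abstractions

Input: DTLHS $\mathcal{H} = (X, U, Y, N)$, quantization $\mathcal{Q} = (A, \Gamma)$

function minCtrAbs($\mathcal{H}$, $\mathcal{Q}$)

1. $\hat{N} \leftarrow \emptyset$
2. for all $\hat{x} \in \Gamma(A_X)$ do
3.     $\hat{N} \leftarrow$ minCtrAbsAux($\mathcal{H}$, $\mathcal{Q}$, $\hat{x}$, $\hat{N}$)
4. return $(\Gamma(A_X), \Gamma(A_U), \hat{N})$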



4 Parallel Synthesis of Control Software 

In this section we present our novel parallel algorithm for the control abstraction generation
of a given DTLHS. Such an algorithm is a parallel version (for distributed memory
systems such as computer clusters) of the standalone Alg. 1. In this way, i.e. by improving
the performance on the bottleneck of QKS, we obtain a significant speedup for the
whole approach to the synthesis of control software for DTLHSs.



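Algorithm 2 Building control abstractions: auxiliary function

Input: DTLHS $\mathcal{H}$, quantization $\mathcal{Q}$, abstract state $\hat{x}$, partial control abstraction $\hat{N}$

function minCtrAbsAux($\mathcal{H}$, $\mathcal{Q}$, $\hat{x}$, $\hat{N}$)

1. for all $\hat{u} \in \Gamma(A_U)$ do
2.     if $\neg\,\mathcal{Q}$-admissible($\mathcal{H}$, $\mathcal{Q}$, $\hat{x}$, $\hat{u}$) then continue
3.     if selfLoop($\mathcal{H}$, $\mathcal{Q}$, $\hat{x}$, $\hat{u}$) then $\hat{N} \leftarrow \hat{N} \cup \{(\hat{x}, \hat{u}, \hat{x})\}$
4.     $O \leftarrow$ overImg($\mathcal{H}$, $\mathcal{Q}$, $\hat{x}$, $\hat{u}$)
5.     for all $\hat{x}' \in \Gamma(O)$ do
6.         if $\hat{x} \ne \hat{x}' \wedge$ existsTrans($\mathcal{H}$, $\mathcal{Q}$, $\hat{x}$, $\hat{u}$, $\hat{x}'$) then
7.             $\hat{N} \leftarrow \hat{N} \cup \{(\hat{x}, \hat{u}, \hat{x}')\}$
8. return $\hat{N}$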
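
The control flow of Algs. 1 and 2 can be sketched in Python as follows. This is only a structural illustration: in QKS the four checks are MILP problems solved with GLPK, whereas here they are delegated to an `oracle` object, and the `TableOracle` class below is a hypothetical stand-in that answers from a small hand-written table of abstract transitions.

```python
def min_ctr_abs_aux(x_hat, actions, oracle, n_hat):
    """Alg. 2 for one abstract state: add to n_hat the transitions leaving x_hat."""
    for u_hat in actions:
        if not oracle.q_admissible(x_hat, u_hat):
            continue
        if oracle.self_loop(x_hat, u_hat):
            n_hat.add((x_hat, u_hat, x_hat))
        # over_img plays the role of Gamma(O): candidate abstract successors.
        for x_next in oracle.over_img(x_hat, u_hat):
            if x_next != x_hat and oracle.exists_trans(x_hat, u_hat, x_next):
                n_hat.add((x_hat, u_hat, x_next))
    return n_hat


def min_ctr_abs(states, actions, oracle):
    """Alg. 1: sequential loop over all abstract states."""
    n_hat = set()
    for x_hat in states:
        n_hat = min_ctr_abs_aux(x_hat, actions, oracle, n_hat)
    return states, actions, n_hat


class TableOracle:
    """Hypothetical oracle (NOT the MILP checks of QKS): answers from a table of transitions."""

    def __init__(self, transitions, states):
        self.transitions = set(transitions)
        self.states = set(states)

    def q_admissible(self, x, u):
        # Admissible iff no listed successor leaves the abstract state space.
        return all(x1 in self.states for (x0, u0, x1) in self.transitions if (x0, u0) == (x, u))

    def self_loop(self, x, u):
        return (x, u, x) in self.transitions

    def over_img(self, x, u):
        return self.states  # coarsest over-approximation: every abstract state

    def exists_trans(self, x, u, x1):
        return (x, u, x1) in self.transitions
```

In this toy setting, an action whose table entry leads outside the abstract state space is skipped entirely, mirroring line 2 of Alg. 2.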



In the following, let $\mathcal{H} = (X, U, Y, N)$ and $\mathcal{Q} = (A, \Gamma)$ be, respectively, the
DTLHS and the quantization in input to our algorithm for control abstraction generation.
Moreover, let p be the number of available processors (cores) in the cluster of
multicore processors.

Our parallel algorithm rests on observing that, for each abstract state $\hat{x} \in \Gamma(A_X)$,
all statements in the for loop of Alg. 1 (lines 2-3) are carried out independently of the
computations needed for any other abstract state. This observation allows us to use
known techniques targeting embarrassingly parallel problems to obtain a significant
speedup on the control abstraction generation phase.

To this aim, we use a map-reduce based parallelization technique. Namely, a master
node of a computer cluster assigns (map) the computations needed for an abstract state
$\hat{x}$ (i.e., lines 2-3 of Alg. 1) to one of the p available nodes (workers), so that each worker
handles approximately $|\Gamma(A_X)|/p$ abstract states, thus balancing the parallel workload.
Once a worker has completed the computations needed for all its abstract states, thus
obtaining a local control abstraction, it sends such local control abstraction back to the
master. The master node collects the local control abstractions and composes them
(with a logical OR operation) in order to obtain the desired global control abstraction.
Note that, as in embarrassingly parallel tasks, communication only takes place at the
beginning and at the end of local computations.

More in detail, as for the mapping phase, our parallel algorithm rests on partitioning
the (finite) abstract state space $\Gamma(A_X)$ into p subspaces $\Gamma^{(1,p)}(A_X), \ldots, \Gamma^{(p,p)}(A_X)$,
thus $\Gamma^{(i,p)}(A_X) \cap \Gamma^{(j,p)}(A_X) = \emptyset$ for $i \ne j$ and $\bigcup_{i=1}^{p} \Gamma^{(i,p)}(A_X) = \Gamma(A_X)$. We
design such a partition so that we can locally check if $\hat{x} \in \Gamma^{(i,p)}(A_X)$, i.e. processor
i may check this by only knowing i, p and $\hat{x}$. This allows us to avoid sending to each
worker the explicit list of abstract states it has to work on, since it suffices to simply
send i and p (together with the overall input $\mathcal{H}$ and $\mathcal{Q}$) to the worker i. To this aim, let
$\mathrm{ord} : \Gamma(A_X) \to [|\Gamma(A_X)|]$ be s.t. $\mathrm{ord}(\hat{x}) = m$ iff $\hat{x}$ is the m-th abstract state in the
lexicographical ordering of $\Gamma(A_X)$. Then, $\Gamma^{(i,p)}(A_X) = \{\hat{x} \in \Gamma(A_X) \mid ((\mathrm{ord}(\hat{x}) - 1) \bmod p) + 1 = i\}$,
mod being the modulus operation. For an example, see the "MAP"
arrow in Fig. 4.
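
The partition described above only needs i, p and the position of a state in the lexicographical ordering. A minimal Python sketch follows; the encoding of abstract states as tuples of small integers, and the function names, are assumptions made for illustration.

```python
from itertools import product


def owner(m, p):
    """Index i in 1..p of the worker handling the m-th abstract state (m = ord(x), 1-based)."""
    return ((m - 1) % p) + 1


def local_states(ranges, i, p):
    """Gamma^(i,p)(A_X): states x, in lexicographical order, with ((ord(x) - 1) mod p) + 1 = i."""
    return [x for m, x in enumerate(product(*ranges), start=1) if owner(m, p) == i]


# Hypothetical setting: two state variables quantized with two bits each, 3 workers.
RANGES = [range(4), range(4)]
PARTS = [local_states(RANGES, i, 3) for i in (1, 2, 3)]
```

With 16 abstract states and 3 workers the subspaces have 6, 5 and 5 elements, so no worker handles more than one state above the average load.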

We outline our novel parallel algorithm in Algs. 4 (for worker nodes) and 3 (for the
master node). An example of an execution is given in Fig. 4. Note that the master node
needs to know the number p of available workers (see Alg. 3), while workers
also know their index i (see Alg. 4), which is sent by the master itself. Alg. 4
is similar to Alg. 1, except that only abstract states $\hat{x} \in \Gamma^{(i,p)}(A_X)$ are considered in
line 2 and that a local control abstraction $\hat{N}_i$ (represented by an OBDD [21]) is computed.
This entails that the global control abstraction $\hat{N}$ has to be computed (lines 3-4
of Alg. 3) once all workers have finished their local computation.

Fig. 4. Example of execution of the parallel algorithm (see Ex. 4).

Example 4. Let $\mathcal{H} = (X, U, Y, N)$ be a DTLHS and $\mathcal{Q}$ a quantization for $\mathcal{H}$ s.t.
$X = [x_1, x_2]$ and $\mathcal{Q}$ discretizes both $x_1$ and $x_2$ with two bits. E.g., this may be the case if $x_1 \in [4]$
is a discrete variable, thus needing two bits for quantization, and $x_2$ is quantized with
two bits. Thus, the starting status for function minCtrAbsMaster in Alg. 3 with input
parameters $\mathcal{H}$, $\mathcal{Q}$ and 3 (number of workers) is the one depicted in Fig. 4a, where
each cell corresponds to an abstract state. Then, function minCtrAbsMaster maps the
workload among the 3 workers as shown in Fig. 4b, where abstract states labeled
with $i \in [3]$ will be handled by worker i. Fig. 4c shows how each worker i computes
its local control abstraction $\hat{N}_i$, under the assumption that each local control abstraction
only has the shown transitions. Finally, Fig. 4d shows how the master rejoins the local
abstractions in order to get the final one, i.e. $\hat{N}$.



Algorithm 3 Building control abstractions in parallel: master node

Input: DTLHS $\mathcal{H}$, quantization $\mathcal{Q}$, workers number p

function minCtrAbsMaster($\mathcal{H}$, $\mathcal{Q}$, p)

1. for all $i \in \{1, \ldots, p\}$ do
2.     create a worker and send $\mathcal{H}$, $\mathcal{Q}$, i and p to it /* map step */
3. wait to get $\hat{N}_1, \ldots, \hat{N}_p$ from workers /* reduce step */
4. return $(\Gamma(A_X), \Gamma(A_U), \bigvee_{j=1}^{p} \hat{N}_j)$
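
The map and reduce steps of the master and worker algorithms can be mimicked in plain Python without MPI. In the sketch below workers are simply run one after another in the same process, local control abstractions are sets of triples rather than OBDDs, and `toy_aux` is a hypothetical stand-in for minCtrAbsAux; the only point is that the union of the local results coincides with the sequential result.

```python
def par_min_ctr_abs(states, aux, i, p):
    """Worker i: local control abstraction over the abstract states assigned to i."""
    local = set()
    for m, x_hat in enumerate(states, start=1):
        if ((m - 1) % p) + 1 == i:  # x_hat belongs to Gamma^(i,p)(A_X)
            local = aux(x_hat, local)
    return local


def min_ctr_abs_master(states, aux, p):
    """Master: map the work to p workers (here simulated sequentially), then reduce by union."""
    local_abstractions = [par_min_ctr_abs(states, aux, i, p) for i in range(1, p + 1)]  # map
    return set().union(*local_abstractions)  # reduce (logical OR of the local relations)


def toy_aux(x_hat, n_hat):
    """Hypothetical stand-in for minCtrAbsAux: one transition from x_hat to (x_hat + 1) mod 16."""
    return n_hat | {(x_hat, 0, (x_hat + 1) % 16)}


def sequential(states, aux):
    """Reference result: the plain sequential loop over all abstract states."""
    n_hat = set()
    for x_hat in states:
        n_hat = aux(x_hat, n_hat)
    return n_hat
```

Since each call to the auxiliary function only depends on its own abstract state, the result does not depend on the number of workers.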



4.1 Implementation with MPI 

We actually implemented Algs. 4 and 3 in PQKS by using MPI (Message Passing Interface,
see [22]). Since MPI is widely used, this allows us to run PQKS on nearly all
computer clusters. Note that in MPI all nodes (processors) execute the same program
(SPMD paradigm), each one knowing its rank i and the number of processors p. Thus
lines 1-2 of Alg. 3 are directly implemented by the MPI framework. Finally, in our
implementation the master node is not a separate node, but it actually acts as a
worker while waiting for local control abstractions from (other) workers. Local control
abstractions from other workers are collected once the master has computed its own
local control abstraction. This allows us to use p nodes instead of p + 1.

Algorithm 4 Building control abstractions in parallel: worker node

Input: DTLHS $\mathcal{H} = (X, U, Y, N)$, quantization $\mathcal{Q} = (A, \Gamma)$, index i, workers number p

function parMinCtrAbs($\mathcal{H}$, $\mathcal{Q}$, i, p)

1. $\hat{N}_i \leftarrow \emptyset$
2. for all $\hat{x} \in \Gamma^{(i,p)}(A_X)$ do
3.     $\hat{N}_i \leftarrow$ minCtrAbsAux($\mathcal{H}$, $\mathcal{Q}$, $\hat{x}$, $\hat{N}_i$)
4. send $\hat{N}_i$ to the master

Note that lines 4 and 3 of, respectively, Algs. 4 and 3 require workers to send their
local control abstraction to the master. Since control abstractions are represented as
OBDDs, which are sparse data structures, this step may be difficult to implement
with a call to MPI_Send (as is usually done in MPI programs), which is designed
for contiguous data. In our experiments, workers use known algorithms (implemented
in the CUDD package) to efficiently dump the OBDD representing their local control
abstraction on the shared filesystem and then perform an MPI_Barrier call in order to
synchronize all workers with the master. After this, the master node collects the local
control abstractions from workers, by reloading them from the shared filesystem, in order
to build the final global one. Consequently, when presenting experimental results in
Sect. 5 we include I/O time in communication time. Note that communication based
on a shared filesystem is also very common in native MapReduce implementations like
Hadoop.
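The file-based exchange described above can be sketched as follows. This is only an illustration under stated assumptions: JSON files in a temporary directory stand in for the CUDD OBDD dumps on the shared filesystem, the file naming scheme is hypothetical, and the barrier is indicated by a comment rather than an actual MPI call.

```python
import json
import tempfile
from pathlib import Path


def dump_local(shared_dir: Path, rank: int, transitions) -> Path:
    """Worker side: write the local abstraction to the shared directory."""
    path = shared_dir / f"local_{rank}.json"  # hypothetical naming scheme
    path.write_text(json.dumps(sorted(transitions)))
    return path


def collect(shared_dir: Path, p: int) -> set:
    """Master side: reload all p local abstractions and merge them."""
    merged = set()
    for rank in range(p):
        data = json.loads((shared_dir / f"local_{rank}.json").read_text())
        merged |= {tuple(t) for t in data}
    return merged


def demo() -> set:
    with tempfile.TemporaryDirectory() as d:
        shared = Path(d)
        dump_local(shared, 0, {(0, 1, 2)})
        dump_local(shared, 1, {(3, 0, 1), (0, 1, 2)})
        # A barrier would separate the dump phase from the collect phase here.
        return collect(shared, 2)
```

Separating the dump and collect phases by a barrier guarantees that the master only reads complete files, at the price of idle time for workers that finish early.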

Finally, Algs. 4 and 3 may conceptually be implemented on multithreaded systems
with shared memory. However, in our implementation we use GLPK as an external library
to solve the MILP problems required in computations inside function minCtrAbsAux (see
Alg. 2). Since GLPK is not thread-safe, we cannot implement Algs. 3 and 4 on
multithreaded shared memory systems.

5 Experimental Results 

In this section we present experimental results obtained by using our parallel approach
on two meaningful and challenging examples for the automatic synthesis of
correct-by-construction control software, namely the inverted pendulum and the multi-input
buck DC-DC converter. To this aim, we show the gain of the parallel approach with respect
to the serial algorithm, also providing standard measures such as communication and
I/O time.

We implement functions minCtrAbsMaster and parMinCtrAbs of Algs. 3 and 4 in the C
programming language, using the CUDD package for OBDD based computations,
the GLPK package for solving MILP problems, and MPI for the parallel setting and
communication. The resulting tool, PQKS (Parallel QKS), extends the tool QKS [2]
by replacing function minCtrAbs of Alg. 1 with function minCtrAbsMaster of Alg. 3.
We run PQKS for different numbers of bits of AD conversion and an increasing number of
processors.

In Sects. 5.1 and 5.2 we present the DTLHS models of the inverted pendulum
and the multi-input buck DC-DC converter, on which our experiments focus. In
Sect. 5.3 we give the details of the experimental setting, and finally, in Sect. 5.4 we
discuss experimental results.

5.1 The Inverted Pendulum as a DTLHS 
Fig. 5. Inverted Pendulum with Stationary Pivot Point.



Fig. 6. Multi-input Buck DC-DC converter. 



The inverted pendulum [7] (see Fig. 5) is modeled by taking the angle $\theta$ and the
angular velocity $\dot{\theta}$ as state variables. The input of the system is the torquing force $u \cdot F$, that
can influence the velocity in both directions. Here, the variable $u$ models the direction
and the constant $F$ models the intensity of the force. Differently from [7], we consider
the problem of finding a discrete controller, whose decisions may be only "apply the
force clockwise" ($u = 1$), "apply the force counterclockwise" ($u = -1$), or "do nothing"
($u = 0$). The behaviour of the system depends on the pendulum mass $m$, the length
of the pendulum $l$, and the gravitational acceleration $g$. Given such parameters, the
motion of the system is described by the differential equation
$\ddot{\theta} = \frac{g}{l} \sin\theta + \frac{1}{m l^2} u F$.

In order to obtain a state space representation, we consider the following normalized
system, where $x_1$ is the angle $\theta$ and $x_2$ is the angular speed $\dot{\theta}$:

$$
\begin{aligned}
\dot{x}_1 &= x_2 \\
\dot{x}_2 &= \frac{g}{l} \sin x_1 + \frac{1}{m l^2} u F
\end{aligned}
\qquad (1)
$$

The discrete time model obtained from the equations in (1) by introducing a constant $T$
that models the sampling time is:

$$
(x_1' = x_1 + T x_2) \land \left(x_2' = x_2 + T \frac{g}{l} \sin x_1 + T \frac{1}{m l^2} u F\right)
$$

that is not linear, as it contains the function $\sin x_1$. A linear model can be found by
under- and over-approximating the non-linear function $\sin x$. In our experiments, we will
consider the linear model obtained as follows.



First of all, in order to exploit the periodicity of the sine function, we consider the equation
$x_1 = 2\pi y_k + y_\alpha$, where $y_k$ represents the period in which $x_1$ lies and $y_\alpha \in [-\pi, \pi]$³
represents the actual $x_1$ inside a given period. Then, we partition the interval $[-\pi, \pi]$ in four
intervals: $I_1 = [-\pi, -\frac{\pi}{2}]$, $I_2 = [-\frac{\pi}{2}, 0]$, $I_3 = [0, \frac{\pi}{2}]$, $I_4 = [\frac{\pi}{2}, \pi]$. In each interval
$I_i$ ($i \in [4]$), we consider two linear functions $f_i^+(x)$ and $f_i^-(x)$, such that for all
$x \in I_i$, we have that $f_i^-(x) \le \sin x \le f_i^+(x)$. As an example, $f_1^+(y_\alpha) = -0.637 y_\alpha - 2$
and $f_1^-(y_\alpha) = -0.707 y_\alpha - 2.373$.
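As a sanity check on the bounds for $I_1$, the following sketch evaluates both linear functions on a grid. It assumes that the upper bound is the chord of the sine through $(-\pi, 0)$ and $(-\pi/2, -1)$ and that the lower bound is the tangent at $-3\pi/4$; these two constructions reproduce the rounded coefficients quoted above, but the source does not state how the functions were derived. The function names are illustrative.

```python
import math


def f1_plus(y: float) -> float:
    # Upper bound on I_1: chord through (-pi, 0) and (-pi/2, -1), slope -2/pi.
    return -(2.0 / math.pi) * (y + math.pi)


def f1_minus(y: float) -> float:
    # Lower bound on I_1: tangent of sin at -3*pi/4.
    y0 = -3.0 * math.pi / 4.0
    return math.sin(y0) + math.cos(y0) * (y - y0)


def bounds_hold(samples: int = 1000, tol: float = 1e-12) -> bool:
    """Check f1_minus <= sin <= f1_plus on a uniform grid over I_1."""
    lo, hi = -math.pi, -math.pi / 2.0
    for k in range(samples + 1):
        y = lo + (hi - lo) * k / samples
        if not (f1_minus(y) - tol <= math.sin(y) <= f1_plus(y) + tol):
            return False
    return True
```

The sine is convex on $I_1$, so a chord lies above it and a tangent lies below it, which is why this pair of lines brackets $\sin x$ on the whole interval.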

Let us consider the set of fresh continuous variables $Y^r = \{y_\alpha, y_{\sin}\}$ and the set
of fresh discrete variables $Y^d = \{y_k, y_q, y_1, y_2, y_3, y_4\}$, with $y_1, \ldots, y_4$ being boolean
variables. The DTLHS model $\mathcal{I}_F$ for the inverted pendulum is the tuple $(X, U, Y, N)$,
where $X = \{x_1, x_2\}$ is the set of continuous state variables, $U = \{u\}$ is the set of
input variables, $Y = Y^r \cup Y^d$ is the set of auxiliary variables, and the transition relation
$N(X, U, Y, X')$ is the following predicate:

$$
\begin{aligned}
& (x_1' = x_1 + 2\pi y_q + T x_2) \land \left(x_2' = x_2 + T \frac{g}{l} y_{\sin} + T \frac{1}{m l^2} u F\right) \\
& \land \bigwedge_{i \in [4]} y_i \rightarrow f_i^-(y_\alpha) \le y_{\sin} \le f_i^+(y_\alpha) \\
& \land \bigwedge_{i \in [4]} y_i \rightarrow y_\alpha \in I_i \;\land\; \sum_{i \in [4]} y_i \ge 1 \\
& \land x_1 = 2\pi y_k + y_\alpha \;\land\; -\pi \le x_1' \le \pi
\end{aligned}
$$

Overapproximations of the system behaviour increase system nondeterminism. Since
the $\mathcal{I}_F$ dynamics overapproximates the dynamics of the non-linear model, the controllers
that we synthesize are inherently robust, that is they meet the given closed loop
requirements notwithstanding nondeterministic small disturbances such as variations in the
plant parameters. Tighter overapproximations of non-linear functions make finding a
controller easier, whereas coarser overapproximations make controllers more robust.

The typical goal for the inverted pendulum is to turn the pendulum steady to the up- 
right position, starting from any possible initial position, within a given speed interval. 



5.2 Multi-input Buck DC-DC Converter 

The multi-input buck DC-DC converter [6] in Fig. 6 is a mixed-mode analog circuit
converting the DC input voltage ($V_i$ in Fig. 6) to a desired DC output voltage ($v_O$ in
Fig. 6). As an example, buck DC-DC converters are used off-chip to scale down the
typical laptop battery voltage (12-24 V) to the just few volts needed by the laptop
processor (e.g. [23]) as well as on-chip to support Dynamic Voltage and Frequency
Scaling (DVFS) in multicore processors (e.g. [24]). Because of its widespread use, control
schemas for buck DC-DC converters have been widely studied (e.g. see [24, 23]). The
typical software based approach (e.g. see [23]) is to control the switches $u_1, \ldots, u_n$ in
Fig. 6 (typically implemented with a MOSFET) with a microcontroller.

In such a converter (Fig. 6), there are $n$ power supplies with voltage values $V_1, \ldots, V_n$,
$n$ switches with voltage values $v_1^u, \ldots, v_n^u$ and current values $I_1^u, \ldots, I_n^u$, and $n$ input
diodes $D_0, \ldots, D_{n-1}$ with voltage values $v_0^D, \ldots, v_{n-1}^D$ and current $i_0^D, \ldots, i_{n-1}^D$ (in
the following, we will write $v_D$ for $v_0^D$ and $i_D$ for $i_0^D$).



³ In this section we write $\pi$ for a rational approximation of it.



The circuit state variables are $i_L$ and $v_C$. However we can also use the pair $i_L$, $v_O$
as state variables in the DTLHS model since there is a linear relationship between $i_L$,
$v_C$ and $v_O$, namely: $v_O = \frac{r_C R}{r_C + R} i_L + \frac{R}{r_C + R} v_C$. We model the $n$-input buck DC-DC
converter with the DTLHS $\mathcal{B}_n = (X, U, Y, N)$, with $X = [i_L, v_O]$, $U = [u_1, \ldots, u_n]$,
$Y = [v_D, v_1^D, \ldots, v_{n-1}^D, i_D, I_1^u, \ldots, I_n^u, v_1^u, \ldots, v_n^u]$.

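The transition relation $N$ is as follows. From a simple circuit analysis (e.g. see [25])
we have the following equations:

$$
\begin{aligned}
\dot{i}_L &= a_{1,1} i_L + a_{1,2} v_O + a_{1,3} v_D \\
\dot{v}_O &= a_{2,1} i_L + a_{2,2} v_O + a_{2,3} v_D
\end{aligned}
$$

where the coefficients $a_{i,j}$ depend on the circuit parameters $R$, $r_L$, $r_C$, $L$ and $C$ in the
following way: $a_{1,1} = -\frac{r_L}{L}$, $a_{1,2} = -\frac{1}{L}$, $a_{1,3} = -\frac{1}{L}$,
$a_{2,1} = \frac{R}{r_C + R}\left[-\frac{r_C r_L}{L} + \frac{1}{C}\right]$,
$a_{2,2} = \frac{-1}{r_C + R}\left[\frac{r_C R}{L} + \frac{1}{C}\right]$,
$a_{2,3} = -\frac{1}{L} \frac{r_C R}{r_C + R}$. Using a discrete time model with sampling
time $T$ (writing $x'$ for $x(t + 1)$) we have:

$$
\begin{aligned}
i_L' &= (1 + T a_{1,1}) i_L + T a_{1,2} v_O + T a_{1,3} v_D \\
v_O' &= T a_{2,1} i_L + (1 + T a_{2,2}) v_O + T a_{2,3} v_D.
\end{aligned}
$$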
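The coefficient formulas and the discrete time update can be evaluated numerically with a short sketch. The function names `buck_coefficients` and `buck_step` are illustrative and not part of QKS or PQKS; the sketch assumes the coefficient expressions of this section and treats $v_D$ as a given input, ignoring the switching constraints that determine it in the full model.

```python
def buck_coefficients(R, r_L, r_C, L, C):
    """Continuous-time coefficients a_{i,j} of the buck converter model."""
    k = 1.0 / (r_C + R)
    a11 = -r_L / L
    a12 = -1.0 / L
    a13 = -1.0 / L
    a21 = R * k * (-r_C * r_L / L + 1.0 / C)
    a22 = -k * (r_C * R / L + 1.0 / C)
    a23 = -(1.0 / L) * r_C * R * k
    return (a11, a12, a13), (a21, a22, a23)


def buck_step(i_L, v_O, v_D, T, coeffs):
    """One sampling step of the discrete time model with sampling time T."""
    (a11, a12, a13), (a21, a22, a23) = coeffs
    i_next = (1 + T * a11) * i_L + T * a12 * v_O + T * a13 * v_D
    v_next = T * a21 * i_L + (1 + T * a22) * v_O + T * a23 * v_D
    return i_next, v_next
```

For instance, `buck_coefficients(5.0, 0.1, 0.1, 2e-4, 5e-5)` uses the parameter values of Sect. 5.3 and yields $a_{1,1} = -500$ and $a_{1,2} = a_{1,3} = -5000$.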

The algebraic constraints stemming from the constitutive equations of the switching 
elements are the following: 



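$$
\begin{array}{ll}
q_0 \rightarrow v_D = R_{on} i_D & \bar{q}_0 \rightarrow v_D = R_{off} i_D \\
q_0 \rightarrow i_D \ge 0 & \bar{q}_0 \rightarrow v_D \le 0 \\
\bigwedge_{i=1}^{n-1} q_i \rightarrow I_i^u \ge 0 & \bigwedge_{i=1}^{n-1} \bar{q}_i \rightarrow v_i^D \le 0 \\
\bigwedge_{j=1}^{n} u_j \rightarrow v_j^u = R_{on} I_j^u & \bigwedge_{j=1}^{n} \bar{u}_j \rightarrow v_j^u = R_{off} I_j^u \\
\bigwedge_{i=1}^{n-1} v_D = v_i^u + v_i^D - V_i & v_D = v_n^u - V_n
\end{array}
$$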



The typical goal for a multi-input buck is to drive $i_L$ and $v_O$ within given goal
intervals.



5.3 Experimental Setting 

All experiments have been carried out on a cluster with 4 nodes and the Open MPI
implementation of MPI. Each node contains 4 quad-core 2.83 GHz Intel Xeon E5440
processors. Nodes share a common filesystem. We run at most 15 processors per
node.



In the inverted pendulum $\mathcal{I}_F$ with force intensity $F$, as in [7], we set pendulum
parameters $l$ and $m$ in such a way that $\frac{g}{l} = 1$ (i.e. $l = g$) and $\frac{1}{m l^2} = 1$ (i.e. $m = \frac{1}{l^2}$). As
for the quantization, we set $A_{x_1} = [-1.1\pi, 1.1\pi]$ and $A_{x_2} = [-4, 4]$, and we define
$A_{\mathcal{I}_F} = A_{x_1} \times A_{x_2} \times A_u$.

In the multi-input buck DC-DC converter with $n$ inputs $\mathcal{B}_n$, we set constant
parameters as follows: $L = 2 \cdot 10^{-4}$ H, $r_L = 0.1\ \Omega$, $r_C = 0.1\ \Omega$, $R = 5\ \Omega$, $C = 5 \cdot 10^{-5}$ F,
$R_{on} = 0\ \Omega$, $R_{off} = 10^4\ \Omega$, and $V_i = 10i$ V for $i \in [n]$. As for the quantization, we set
$A_{i_L} = [-4, 4]$ and $A_{v_O} = [-1, 7]$, and we define $A_{\mathcal{B}_n} = A_{i_L} \times A_{v_O} \times A_{u_1} \times \ldots \times A_{u_n}$.

In both examples, we use uniform quantization functions dividing the domain of
each state variable $x$ into $2^b$ equal intervals, where $b$ is the number of bits used by
AD conversion. The resulting quantizations are $\mathcal{Q}_{\mathcal{I}_F, b} = (A_{\mathcal{I}_F}, \Gamma_b)$ and
$\mathcal{Q}_{\mathcal{B}_n, b} = (A_{\mathcal{B}_n}, \Gamma_b)$. Since both examples have two quantized variables, each one with $b$ bits,
the number of quantized (abstract) states is exactly $2^{2b}$.
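A uniform quantization of this kind can be sketched as follows. The function names are illustrative, and the convention that the upper endpoint of the domain falls into the last interval is an assumption, since the text does not specify how interval boundaries are assigned.

```python
def quantize(x: float, lo: float, hi: float, b: int) -> int:
    """Index in [0, 2**b - 1] of the interval of [lo, hi] containing x."""
    if not lo <= x <= hi:
        raise ValueError("x outside the quantization domain")
    n = 2 ** b
    idx = int((x - lo) / (hi - lo) * n)
    return min(idx, n - 1)  # assumed: the upper endpoint belongs to the last interval


def abstract_state_count(b: int, num_vars: int = 2) -> int:
    """Number of abstract states with num_vars quantized variables of b bits each."""
    return (2 ** b) ** num_vars
```

With $b = 9$ and two state variables this gives $2^{18} = 262144$ abstract states, and each additional bit multiplies the number of abstract states by 4.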

We run QKS and PQKS on the inverted pendulum model $\mathcal{I}_F$ with $F = 0.5$ N
(force intensity), and on the multi-input buck DC-DC model $\mathcal{B}_n$, with $n = 5$ (number
of inputs). For the inverted pendulum, we use sampling time $T = 0.01$ seconds. For
the multi-input buck, we set $T = 10^{-6}$ seconds. For both systems, we run experiments
varying the number of bits $b = 9, 10$ (also $11$ for the inverted pendulum) and the number
of processors (workers) $p = 1, 10, 20, 30, 40, 50, 60$. Finally, we show how PQKS is
useful when applied to examples requiring a huge computation time by considering the
inverted pendulum with $b = 13$, where the control abstraction is computed with $p = 1$ and
$p = 60$.



Table 1. Experimental Results for inverted pendulum.

       QKS               PQKS
 b     CPU Ctrabs    p   CPU Ctrabs   CT          IO          Speedup   Efficiency   CPU K
 9     8.958e+03     50  2.064e+02    7.696e+02   1.540e+01   43.399    86.798       2.970e+01
 9     8.958e+03     60  1.763e+02    6.825e+02   1.790e+01   50.809    84.681       2.970e+01
 10    3.108e+04     50  8.527e+02    3.112e+03   7.330e+01   36.450    72.900       1.131e+02
 10    3.108e+04     60  7.173e+02    2.170e+03   6.740e+01   43.331    72.218       1.131e+02
 11    1.147e+05     50  3.504e+03    1.242e+04   2.840e+02   32.742    65.485       1.131e+03
 11    1.147e+05     60  2.938e+03    6.762e+03   2.842e+02   39.050    65.084       1.131e+03



Table 2. Experimental Results for multi-input buck DC-DC converter. 
In order to evaluate effectiveness of our approach, we use the following measures:
speedup, efficiency, communication time (in seconds) and I/O time (in seconds). The
speedup of our approach is the serial CPU time divided by the parallel CPU time,
i.e. $\text{Speedup} = \frac{\text{serial CPU}}{\text{parallel CPU}}$. To evaluate scalability of our approach
we define the scaling efficiency (or simply efficiency) as the percentage ratio between
speedup and number of processors $p$, i.e. $\text{Efficiency} = \frac{\text{Speedup}}{p} \cdot 100\%$. In Algs. 3 and 4
the communication time consists in the time needed by all workers to send their local
control abstraction to the master. In agreement with Sect. 4.1, the communication time
includes the I/O time, that is the overall time spent by processors in input/output
activities.
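The two measures are simple ratios, sketched below with illustrative function names. The usage values are taken from the first row of Table 1 ($b = 9$, $p = 50$); since the tabulated CPU times are rounded, the recomputed figures agree with the table only up to rounding.

```python
def speedup(serial_cpu: float, parallel_cpu: float) -> float:
    """Serial CPU time divided by parallel CPU time."""
    return serial_cpu / parallel_cpu


def efficiency(serial_cpu: float, parallel_cpu: float, p: int) -> float:
    """Scaling efficiency as a percentage of the ideal speedup p."""
    return 100.0 * speedup(serial_cpu, parallel_cpu) / p
```

For example, `speedup(8.958e+03, 2.064e+02)` is about 43.4 and `efficiency(8.958e+03, 2.064e+02, 50)` is about 86.8, in line with the reported 43.399 and 86.798.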





Figs. 7, 8, 9 and 10 show, respectively, the speedup, the scaling efficiency, the
communication time (divided by 1000) and the I/O time of Algs. 3 and 4 as a function of
$p$, for the inverted pendulum with $b = 9, 10, 11$. Analogously, Figs. 11, 12, 13 and 14
Fig. 15. Details about inverted pendulum computation time (30 nodes).

Fig. 16. Details about inverted pendulum computation time (40 nodes).



show the same measures (except for the fact that communication time is divided by
10000) for the multi-input buck with $b = 9, 10$.

We also show the absolute values for the experiments with 50 and 60 processors
in Tabs. 1 and 2. Tabs. 1 and 2 have common columns. The meaning of such common
columns is as follows. Column $b$ is the number of bits used for quantization. Column
QKS (CPU Ctrabs) reports the execution time in seconds needed by QKS to compute
the control abstraction (i.e. Alg. 1). Columns PQKS report experimental values
for PQKS. Namely, column $p$ shows the number of processors, column CPU Ctrabs
reports the execution time in seconds for Alg. 3 (i.e., the master execution time, since
it wraps the overall parallel computation), column CT shows the communication time
(including I/O time), column IO shows the I/O time only, column Speedup reports
the speedup and column Efficiency reports the scaling efficiency. Finally, column CPU
K shows the execution time in seconds for the control software generation (i.e., the
remaining computation of QKS, after the control abstraction generation).

5.4 Experiments Discussion 

From Figs. 7 and 11 we note that the speedup is almost linear, with a 2/3 slope. From
Figs. 8 and 12 we note that scaling efficiency remains high when increasing the number
of processors $p$. For example, for $b = 11$ bits, the efficiency of our approach ranges from
75% (10 processors) to 65% (60 processors). In any case, efficiency is always above
65%.

Figs. 9 and 13 show that communication time almost always decreases when $p$
increases. This is explained by the fact that, in our MPI implementation, communication
among nodes takes place mostly when workers send their local control abstractions to
the master via the shared filesystem. Since in our implementation this happens only
after an MPI_Barrier (i.e., the parallel computation may proceed only when all nodes
have reached an MPI_Barrier statement), the communication time also includes the waiting
time of workers that finish their local computation before the other ones. Thus, if
all workers need about the same time to complete the local computation, then the
communication time is low. Note that this also explains the discontinuity when passing from
30 to 40 nodes which may be observed in the figures above. In fact, each worker has
(almost) the same workload in terms of number of abstract states, but some abstract states
may need more computation time than others (i.e., the computation time of function
minCtrAbsAux in Alg. 2 may vary significantly across abstract states). If such
"hard" abstract states are well distributed among workers, communication time is low
(with higher efficiency), otherwise it is high. Figs. 15 and 16 show this phenomenon
on the inverted pendulum quantized with 9 bits, when the parallel algorithm is executed
by 30 and 40 workers, respectively. In such figures, the x-axis represents computation
time, the y-axis the workers, and "hard" abstract states are represented in red.
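The effect of unevenly distributed "hard" abstract states on barrier waiting time can be illustrated with a toy model. The per-state costs and the round-robin assignment below are assumptions made for illustration only and are not measurements from PQKS.

```python
from typing import List


def worker_times(costs: List[float], p: int) -> List[float]:
    """Total local computation time per worker under a round-robin assignment."""
    totals = [0.0] * p
    for k, c in enumerate(costs):
        totals[k % p] += c
    return totals


def barrier_wait(totals: List[float]) -> List[float]:
    """Idle time of each worker at the barrier: slowest worker minus own time."""
    slowest = max(totals)
    return [slowest - t for t in totals]
```

With costs `[1, 1, 1, 10, 1, 1]` and `p = 3`, the worker that receives the expensive state needs 11 time units while the other two need 2 each, so they idle for 9 units at the barrier. Every worker handles the same number of states, yet the waiting time is dominated by where the single hard state lands.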

Finally, in order to show feasibility of our approach also on DTLHSs requiring a
huge computation time to generate the control abstraction, we run PQKS on the inverted
pendulum with $b = 13$. We estimate the computation time for control abstraction
generation for $p = 1$ to be 25 days. On the other hand, with $p = 60$, we are able to compute
the control abstraction in only 16 hours.



6 Conclusions and Future Work 



In this paper we presented a map-reduce-style parallel algorithm (and its implemen- 
tation PQKS) for automatic synthesis of correct-by-construction control software for 
discrete time linear hybrid systems, starting from a formal model of the controlled sys- 
tem, safety and liveness requirements and number of bits for analog-to-digital conver- 
sion. Such an algorithm significantly improves performance of an existing standalone 
approach (implemented in the tool QKS), which may require weeks or even months of 
computation when applied to large-size hybrid systems. 

Our results show that our parallel approach for computing the control abstraction
is effective. In fact, the gain obtained by PQKS with respect to QKS is shown to be
very high, e.g. with 60 processors PQKS outputs the control software for the 13-bit
quantized inverted pendulum in about 16 hours, whereas QKS needs about 25 days of
computation.

Future work may consist in further improving the communication among processors 
(avoiding usage of the shared filesystem), as well as designing a parallel version for 
other architectures than computer clusters, such as GPGPU architectures. 